
SpilloverDiD conley + survey + lag>0 via panel-block composition (Wave E.2 follow-up)#477

Merged
igerber merged 4 commits into main from spillover-conley-wave-e2-followup-lag on May 21, 2026


@igerber
Owner

igerber commented on May 21, 2026

Summary

Extends the panel-aware stratified-Conley spatial sandwich (Wave E.2 cross-sectional, PR #474) to conley_lag_cutoff > 0 by adding a within-PSU serial Bartlett HAC term (Newey-West 1987 separable form). The composition meat = meat_spatial + meat_serial has disjoint index sets, exactly matching the no-survey panel-block decomposition at diff_diff.conley._compute_conley_meat.

  • New sibling helper _compute_stratified_serial_bartlett_meat in diff_diff/two_stage.py (T=1 short-circuit, three-mode singleton-stratum branching, panel-wide FPC, panel-wide dense time codes, zeroed centering for singleton-active-period cells)
  • Orchestrator _compute_stratified_conley_meat extended with conley_lag_cutoff kwarg; spatial loop unchanged; serial helper called after when L>0
  • Post-resolution fail-closed gate at SpilloverDiD.fit for no-effective-PSU + lag>0 (fires AFTER _inject_cluster_as_psu so the documented cluster=<col> injection surface continues to work)
  • 24 new tests across two follow-up test classes (aggregate + event-study)
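The `meat = meat_spatial + meat_serial` composition relies on the two terms covering disjoint sets of observation pairs. Below is a minimal sketch under stated assumptions: scalar scores, a toy 1-D radial Bartlett weight standing in for the Conley kernel, no survey centering or FPC, and hypothetical names (`pair_weight`, `spatial_weight`, `serial_weight`) that are not diff_diff APIs. Same-period pairs go to the spatial term, same-unit pairs in different periods go to the serial term, and every other pair gets zero weight, so the full panel-block sum splits exactly.

```python
from itertools import product

# Hypothetical toy panel: scalar scores keyed by (unit, period).
scores = {
    ("a", 0): 0.5, ("a", 1): -0.2, ("a", 2): 0.1,
    ("b", 0): -0.3, ("b", 1): 0.4, ("b", 2): 0.2,
}
coords = {"a": 0.0, "b": 1.0}  # assumed 1-D unit locations
CUTOFF, LAG = 2.0, 1           # assumed spatial cutoff and serial lag cutoff


def spatial_weight(i, j):
    """Radial Bartlett weight on 1-D distance (toy stand-in for the Conley kernel)."""
    return max(0.0, 1.0 - abs(coords[i] - coords[j]) / CUTOFF)


def serial_weight(t, s):
    """Newey-West Bartlett weight 1 - |t-s|/(L+1); zero beyond the lag cutoff."""
    return max(0.0, 1.0 - abs(t - s) / (LAG + 1))


def pair_weight(o1, o2):
    """Separable panel-block kernel over observation pairs."""
    (i, t), (j, s) = o1, o2
    if t == s:
        return spatial_weight(i, j)  # same period (any units): spatial term
    if i == j:
        return serial_weight(t, s)   # same unit, different periods: serial term
    return 0.0                       # different unit AND different period: excluded


pairs = list(product(scores, scores))
full = sum(pair_weight(o1, o2) * scores[o1] * scores[o2] for o1, o2 in pairs)
meat_spatial = sum(spatial_weight(o1[0], o2[0]) * scores[o1] * scores[o2]
                   for o1, o2 in pairs if o1[1] == o2[1])
meat_serial = sum(serial_weight(o1[1], o2[1]) * scores[o1] * scores[o2]
                  for o1, o2 in pairs if o1[0] == o2[0] and o1[1] != o2[1])

# The two index sets are disjoint and together cover every nonzero-weight pair.
assert abs(full - (meat_spatial + meat_serial)) < 1e-12
```

In the survey path described in this PR, the inputs would be within-stratum-centered PSU totals rather than raw unit scores; the pair split itself does not depend on that.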

Methodology references

Documented synthesis of:

  • Method names: stratified-Conley panel-block sandwich = spatial (Wave E.2) + serial Bartlett HAC
  • Paper / source links:
    • Conley (1999) "GMM Estimation with Cross Sectional Dependence" — spatial kernel
    • Newey-West (1987) "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix" — serial Bartlett kernel weights (1 - |t-s|/(L+1))
    • Binder (1983) "On the Variances of Asymptotically Normal Estimators from Complex Surveys" — FPC factor form
    • Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder TSL composition with two-stage IF
    • Wave D Gardner GMM correction (Butts 2021 §3.1 + Gardner 2022 §4) on SpilloverDiD's ring-indicator stage-2 design
  • Intentional deviations:
    • Serial term uses per-period within-stratum centering (Binder TSL form), NOT raw scores like the no-survey panel-block reference at conley.py:949-965. Documented in REGISTRY ("Centering asymmetry vs no-survey reference"): the no-survey path assumes E[scores] = 0 so centering is a no-op; survey-weighted Binder TSL needs explicit centering or it inflates variance by twice the squared per-period stratum mean.
    • FPC for the serial term uses panel-wide n_h_panel per stratum, NOT per-period n_h_t. Standalone Newey-West composition on stratified clusters — the serial sum is a panel-level construct so the cluster set is panel-wide. Spatial term keeps its existing per-period FPC unchanged.
    • Requires an effective PSU (explicit survey_design.psu OR cluster=<col> injected as PSU per Wave E.1's _inject_cluster_as_psu). No-effective-PSU survey designs raise NotImplementedError per feedback_no_silent_failures (pseudo-PSU = obs-index fallback would silently zero the serial sum). Tracked in TODO.md.

Full details in docs/methodology/REGISTRY.md section "Variance (Wave E.2 follow-up - conley_lag_cutoff > 0 panel-block composition via spatial + serial Bartlett HAC)".
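For reference, the Newey-West Bartlett weights `(1 - |t-s|/(L+1))` cited above and the serial sum they feed can be sketched for a single PSU's score sequence as follows. This is an illustrative sketch, not the `_compute_stratified_serial_bartlett_meat` implementation: it assumes one scalar score per period, ignores centering and FPC, and both function names are hypothetical.

```python
from typing import Sequence


def bartlett_weight(gap: int, lag: int) -> float:
    """Newey-West (1987) Bartlett weight 1 - |gap|/(L+1); zero once |gap| > L."""
    return max(0.0, 1.0 - abs(gap) / (lag + 1))


def serial_meat_one_psu(seq: Sequence[float], lag: int) -> float:
    """Serial part of a Bartlett HAC for one PSU's per-period score sequence.

    Only t != s pairs are summed; the t == s diagonal belongs to the spatial term.
    """
    total = 0.0
    for t, u_t in enumerate(seq):
        for s, u_s in enumerate(seq):
            if t != s:
                total += bartlett_weight(t - s, lag) * u_t * u_s
    return total


print([round(bartlett_weight(g, 2), 4) for g in range(4)])  # [1.0, 0.6667, 0.3333, 0.0]
print(serial_meat_one_psu([1.0, 2.0, 3.0], lag=1))          # 8.0
print(serial_meat_one_psu([1.0, 2.0, 3.0], lag=0))          # 0.0
```

With `L = 0` every off-diagonal weight is zero, so the serial sum vanishes identically. With `L > T-1` every weight stays strictly positive, so the sum remains well-defined.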

Validation

  • Tests added/updated: tests/test_spillover.py (24 new test methods across TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff and TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy). Existing test_j0_panel_conley_lag_cutoff_rejected_under_survey (Wave E.2-era gate assertion) deleted.
  • Coverage: lag=0 strict bit-identity to shipped Wave E.2 (mock-spy + meat parity), raw-vs-centered hand-check, L=1 + L=2 hand-computation methodology anchors, AR(1) DGP behavioral SE inflation (rho=0.7, > 5%), cross-stratum independence, panel-wide dense time codes on unbalanced panel, singleton-adjust FPC skip, all-singleton saturation NaN-fail, singleton-active-period centering zeros, no-effective-PSU rejection, cluster-injected-PSU positive surface with SE parity vs explicit PSU, fit idempotency, drift goldens, event-study mirror on both is_staggered branches.
  • Backtest evidence: full _scratch/wave_e2_followup_smoke.py hand-computation anchor for the methodology composition.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

…e E.2 follow-up)

Extends the panel-aware stratified-Conley spatial sandwich (Wave E.2 cross-
sectional, PR #474) to `conley_lag_cutoff > 0` by adding a within-PSU serial
Bartlett HAC term (Newey-West 1987 separable form). The composition
`meat = meat_spatial + meat_serial` has disjoint index sets, exactly matching
the no-survey panel-block decomposition at
`diff_diff.conley._compute_conley_meat`.

Methodology — documented synthesis of:
- Conley (1999) spatial-HAC
- Newey-West (1987) serial Bartlett kernel weights `(1 - |t-s|/(L+1))`
- Binder (1983) / Gerber (2026) Prop 1 stratified TSL on Wave D Gardner GMM
  influence functions

Serial term uses per-period within-stratum centering (Binder TSL form,
matching the spatial helper); panel-wide per-stratum FPC (the serial sum is a
panel-level construct, so the cluster set is panel-wide); hardcoded Bartlett
serial kernel regardless of `conley_kernel` (mirrors `conley.py:951-965`);
panel-wide dense time codes for lag math (matches `conley.py:940` R deviation).
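Panel-wide dense time codes matter on unbalanced panels. Below is a minimal sketch, assuming periods are sortable labels and using a hypothetical `panel_dense_codes` helper. The codes are built from the union of periods across all units, so a unit that skips periods keeps its true lag distance instead of appearing adjacent.

```python
from typing import Dict, Hashable, Iterable

LAG = 1  # assumed lag cutoff L


def panel_dense_codes(periods: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Map every period observed anywhere in the panel to 0..K-1 in sorted order."""
    return {p: k for k, p in enumerate(sorted(set(periods)))}


def bartlett_weight(gap: int, lag: int = LAG) -> float:
    """Bartlett weight 1 - |gap|/(L+1), floored at zero."""
    return max(0.0, 1.0 - abs(gap) / (lag + 1))


# Hypothetical unbalanced panel: unit "b" is unobserved in 2002 and 2003.
panel = {"a": [2001, 2002, 2003, 2004], "b": [2001, 2004]}
codes = panel_dense_codes(p for ps in panel.values() for p in ps)

b_panel_wide = [codes[p] for p in panel["b"]]  # [0, 3]: lag distance 3
b_per_unit = list(range(len(panel["b"])))      # [0, 1]: would wrongly look adjacent

print(bartlett_weight(b_panel_wide[1] - b_panel_wide[0]))  # 0.0: pair excluded at L = 1
print(bartlett_weight(b_per_unit[1] - b_per_unit[0]))      # 0.5: per-unit codes include it
```

One consequence of dense coding is that lags count observed period steps, not calendar distance. For example, a panel observed only in 2000, 2002, and 2004 treats 2000 and 2002 as one lag apart.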

Supported surface — requires an effective PSU: either an explicit
`survey_design.psu` OR a `cluster=<col>` argument that gets injected as the
effective PSU per Wave E.1's `_inject_cluster_as_psu` routing. No-effective-PSU
survey designs (weights-only / strata-only WITHOUT a cluster fallback) raise
`NotImplementedError` post-resolution at `SpilloverDiD.fit` per
`feedback_no_silent_failures`: the pseudo-PSU = obs-index fallback would
silently zero the serial sum (each pseudo-PSU appears in exactly one period).
Routing the serial loop to `conley_unit` would mix IF allocators with the
spatial term and is queued as a follow-up.
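The fail-closed logic, and the reason the obs-index pseudo-PSU breaks the serial term, can both be illustrated in a few lines. This sketch is hypothetical: `resolve_effective_psu` and `n_serial_pairs` are not diff_diff functions, and the real gate checks `resolved_survey_fit.psu` after `_inject_cluster_as_psu` has run. It mirrors only the ordering described above: inject the cluster column first, then raise if no effective PSU remains and the lag cutoff is positive.

```python
from typing import Optional, Sequence


def resolve_effective_psu(psu: Optional[str], cluster: Optional[str], lag: int) -> Optional[str]:
    """Hypothetical post-resolution gate: inject cluster as PSU first, then fail closed."""
    effective = psu if psu is not None else cluster  # explicit PSU wins; cluster is the fallback
    if lag > 0 and effective is None:
        raise NotImplementedError(
            "conley_lag_cutoff > 0 with survey_design requires an effective PSU "
            "(survey_design.psu or cluster=<col>)"
        )
    return effective


def n_serial_pairs(psu_ids: Sequence[int], periods: Sequence[int]) -> int:
    """Count ordered observation pairs sharing a PSU but not a period (the serial index set)."""
    n = len(psu_ids)
    return sum(
        1
        for a in range(n)
        for b in range(n)
        if psu_ids[a] == psu_ids[b] and periods[a] != periods[b]
    )


periods = [0, 1, 0, 1]       # two units, each observed in two periods
unit_psu = [0, 0, 1, 1]      # PSU = unit: the serial sum has terms
pseudo_psu = list(range(4))  # PSU = observation index: one period per PSU
print(n_serial_pairs(unit_psu, periods))    # 4
print(n_serial_pairs(pseudo_psu, periods))  # 0
```

Under the pseudo-PSU fallback no PSU spans more than one period, so the serial index set is empty and the Bartlett sum would be silently zero. That is why the gate raises instead of returning a number.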

Code changes:
- New sibling helper `_compute_stratified_serial_bartlett_meat` in
  `diff_diff/two_stage.py` (T=1 short-circuit, three-mode singleton-stratum
  branching with FPC inside the multi-PSU block to avoid divide-by-zero,
  panel-wide mean for `lonely_psu='adjust'`, zeroed centering for
  singleton-active-period cells so raw scores don't leak into the serial
  Bartlett cross-products under unbalanced panels)
- Orchestrator `_compute_stratified_conley_meat` extended with
  `conley_lag_cutoff` kwarg; spatial loop unchanged; serial helper called
  after spatial loop when `L > 0`
- Dispatch in `_compute_gmm_corrected_meat` conley branch threads
  `conley_lag_cutoff` through
- `spillover.py:2210` Wave E.2-era `NotImplementedError` gate for lag>0 +
  survey deleted; replaced with post-resolution fail-closed gate that fires
  only when `resolved_survey_fit.psu` is None AFTER cluster injection (so
  the documented `cluster=<col>` injection surface continues to work)
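The per-period within-stratum centering and the singleton-active-period zeroing can be sketched as below. Assumptions: scores are already aggregated to PSU totals and keyed by a hypothetical `(stratum, psu, period)` tuple, and neither `lonely_psu` modes nor FPC are modeled. Centering a one-member cell by its own mean already yields zero; the explicit branch in the sketch only makes the zeroing intent visible.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]  # (stratum, psu, period): hypothetical layout


def center_within_stratum_period(scores: Dict[Key, float]) -> Dict[Key, float]:
    """Binder-style centering of PSU score totals within each (stratum, period) cell.

    A cell with a single active PSU is set to exactly 0.0, so that PSU's raw score
    cannot leak into later serial cross-products.
    """
    cells: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for (h, _psu, t), z in scores.items():
        cells[(h, t)].append(z)
    centered = {}
    for (h, psu, t), z in scores.items():
        members = cells[(h, t)]
        centered[(h, psu, t)] = 0.0 if len(members) == 1 else z - sum(members) / len(members)
    return centered


# Unbalanced toy stratum: PSU "p2" is inactive in period 1.
raw = {("s1", "p1", 0): 3.0, ("s1", "p2", 0): 1.0, ("s1", "p1", 1): 5.0}
cen = center_within_stratum_period(raw)
print(cen)  # {('s1', 'p1', 0): 1.0, ('s1', 'p2', 0): -1.0, ('s1', 'p1', 1): 0.0}

# p1's lag-1 cross-product: raw 3.0 * 5.0 versus centered 1.0 * 0.0.
print(raw[("s1", "p1", 0)] * raw[("s1", "p1", 1)], cen[("s1", "p1", 0)] * cen[("s1", "p1", 1)])  # 15.0 0.0
```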

Tests — 24 new methods across two classes
(`TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff` and
`TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy`):
- `test_a` lag=0 strict bit-identity to shipped Wave E.2 meat
- `test_a2` lag=0 does NOT invoke serial helper (mock-spy)
- `test_b` lag=1 invokes serial helper exactly once (mock-spy)
- `test_c0` raw-vs-centered hand-check pins Binder TSL centering
- `test_c1`/`test_c2` hand-computation methodology anchors at L=1 and L=2
- `test_c3` AR(1) DGP serial inflation behavioral pin (rho=0.7, > 5%)
- `test_d` single-stratum lag=1 finite output
- `test_e` cross-stratum independence of serial term (partition + sum)
- `test_f` singleton-adjust + lag=1 no divide-by-zero
- `test_f2` all-singleton-remove + lag=1 returns zero meat
- `test_g` unbalanced panel + panel-wide dense time codes (hand-computed)
- `test_g2` lag > T-1 well-defined
- `test_h` singleton-active-period centering zeros (sparse-period regression)
- `test_j` no-survey panel-block conley unchanged after gate relaxation
- `test_k` replicate-weight rejection still fires
- `test_l` cluster + lag=1 + survey warn-and-use-PSU
- `test_m` fit-idempotency under lag=1 + survey
- `test_n`/`test_n2` no-effective-PSU survey + lag>0 raises NotImplementedError
- `test_n3` cluster-injected effective-PSU surface fits + matches explicit PSU
- `test_r` drift goldens at lag=1 vs lag=0 (ATT invariant, SE differs)
- `test_o`/`test_p`/`test_r` event-study mirror (both is_staggered branches)

Existing `test_j0_panel_conley_lag_cutoff_rejected_under_survey` (Wave E.2-era
gate-assertion) deleted.
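To illustrate the kind of behavioral check described for `test_c3`, the sketch below simulates AR(1) scores with rho = 0.7 and checks that adding the L = 1 Bartlett serial term inflates the meat by more than 5%. It is a simplified stand-in, not the actual test: there is no spatial structure (each PSU contributes only its own sequence), there is no centering or FPC, and the panel size (200 PSUs by 10 periods) and seed are arbitrary choices.

```python
import math
import random
from typing import List


def bartlett_weight(gap: int, lag: int) -> float:
    """Bartlett weight 1 - |gap|/(L+1), floored at zero (gap 0 gets weight 1)."""
    return max(0.0, 1.0 - abs(gap) / (lag + 1))


def toy_meat(panel: List[List[float]], lag: int) -> float:
    """Sum over PSUs of Bartlett-weighted score cross-products, diagonal included."""
    total = 0.0
    for seq in panel:
        for t, u_t in enumerate(seq):
            for s, u_s in enumerate(seq):
                total += bartlett_weight(t - s, lag) * u_t * u_s
    return total


def ar1_panel(n_psu: int, n_periods: int, rho: float, seed: int = 0) -> List[List[float]]:
    """Independent stationary AR(1) score sequences, one per PSU."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_psu):
        u = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))  # stationary initial draw
        seq = [u]
        for _ in range(n_periods - 1):
            u = rho * u + rng.gauss(0.0, 1.0)
            seq.append(u)
        panel.append(seq)
    return panel


panel = ar1_panel(n_psu=200, n_periods=10, rho=0.7)
inflation = toy_meat(panel, lag=1) / toy_meat(panel, lag=0) - 1.0
print(f"L=1 inflation over lag-0 meat: {inflation:.1%}")
```

In expectation the L = 1 serial term is about rho (T-1)/T of the lag-0 meat (roughly 0.63 here), so a 5% threshold is a loose behavioral floor rather than a precise target.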

Docs:
- REGISTRY `Variance (Wave E.2 follow-up)` subsection with documented-
  synthesis framing + cross-references + effective-PSU restriction
- `spillover.rst` Wave E.2 follow-up stanza
- CHANGELOG `[Unreleased]` bullet
- `llms.txt` + `README.md` catalog entries updated
- `references.rst` adds Newey-West (1987)
- TODO row deleted (old deferral); new row added for the no-effective-PSU
  follow-up tail

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

Overall Assessment

✅ Looks good

Executive Summary

  • No unmitigated P0/P1 issues found in the new SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...) path.
  • The core methodology choices for the new panel-block survey variance path are documented in docs/methodology/REGISTRY.md, and the unsupported no-effective-PSU surface is fail-closed and properly tracked in TODO.md.
  • P2: the new survey panel-block meat path drops the Conley non-PSD warning surface that the registry says should exist on GMM-corrected Conley paths.
  • P3: public docs still advertise the new support too broadly; they omit the effective-PSU restriction that SpilloverDiD.fit now enforces.
  • P3: some reduction/test wording overstates what is actually implemented or pinned.

Methodology

  • Severity: P2. docs/methodology/REGISTRY.md:L3229-L3231 says Conley non-PSD warnings apply on GMM-corrected paths, and diff_diff/conley.py:L972-L980 implements that on the shared no-survey Conley path. The new survey panel-block path composes meat = meat_spatial + meat_serial in diff_diff/two_stage.py:L806-L841 and computes the serial term in diff_diff/two_stage.py:L844-L1113, but never re-runs a final finite/eigenvalue check on the combined survey meat before SpilloverDiD.fit turns diagonals into SEs via sqrt(max(vcov[ii], 0)) at diff_diff/spillover.py:L3292-L3316. Impact: an indefinite survey panel-block vcov can now reach users without the warning the methodology registry promises. Concrete fix: after adding the serial term, apply the same non-finite guard and negative-eigenvalue warning that _compute_conley_meat applies, but on the final combined survey meat matrix.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3. The new serial Bartlett implementation in diff_diff/two_stage.py:L1099-L1107 duplicates the panel time-kernel logic already implemented in diff_diff/conley.py:L949-L965. Impact: survey and no-survey Conley paths can drift on diagnostics and kernel behavior; the missing PSD warning above is already one example. Concrete fix: factor out the shared panel serial-kernel/post-check logic, or centralize final meat validation in one helper.

Tech Debt

  • Severity: P3-informational. The remaining no-effective-PSU gap is fail-closed in diff_diff/spillover.py:L3089-L3131 and explicitly tracked in TODO.md:L142. Impact: this does not block the PR; the unsupported surface is documented and does not silently return wrong numbers. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. SpilloverDiD.fit now requires an effective PSU for survey_design + vcov_type="conley" + conley_lag_cutoff > 0 (diff_diff/spillover.py:L3089-L3131), and the registry documents that restriction at docs/methodology/REGISTRY.md:L3299-L3303, but the public docs still advertise the feature more broadly in README.md:L109, diff_diff/guides/llms.txt:L61, and docs/api/spillover.rst:L329-L396 without that caveat. Impact: weights-only / strata-only survey users can hit an unexpected NotImplementedError despite the public docs saying the feature is supported. Concrete fix: add the effective-PSU restriction everywhere the new support is announced.
  • Severity: P3. The methodology/test wording overstates the implemented reduction and regression surface: the code uses centered PSU scores for both survey terms (diff_diff/survey.py:L2031-L2043, diff_diff/two_stage.py:L1005-L1110), but the new docs/changelog still describe single-stratum cases as “plain” Conley/Newey-West reductions (docs/methodology/REGISTRY.md:L3292-L3293, docs/api/spillover.rst:L370-L374, CHANGELOG.md:L13). Separately, tests/test_spillover.py:L6483-L6529 says it pins the “full meat matrix” at lag=0, but it only asserts att and se. Impact: the documented contract is harder to audit than the implementation. Concrete fix: reword those reductions to centered PSU totals plus the survey factor, and tone down the test/changelog language unless the meat matrix itself is exposed and asserted.

Execution note: this was a static diff review; I could not run the added tests in the provided environment because pytest and numpy were unavailable.

P2 (Methodology — missing PSD/finite warning on combined survey meat):
mirror `_compute_conley_meat`'s finite + negative-eigenvalue guard on
the combined `meat = meat_spatial + meat_serial` returned by the survey
panel-block orchestrator. Both the radial 1-D Bartlett spatial kernel
AND the Newey-West Bartlett serial kernel are practitioner specializations
that are NOT formally PSD-guaranteed; adding two non-PSD-guaranteed terms
can produce a more indefinite combined meat, so the diagnostic surface
matters more on the panel-block path than the no-survey baseline. Guard
fires after the saturation NaN-fail check (so genuinely-saturated meats
NaN-propagate without spurious warning).
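A guard of this shape can be sketched with NumPy as follows. It is a sketch under assumptions, not `_compute_conley_meat`'s code: the relative tolerance, the warning messages, and the choice to warn (rather than raise) on non-finite entries are illustrative, and `guard_combined_meat` is a hypothetical name. The ordering follows the text above: an all-NaN (saturated) meat passes through silently, and the eigenvalue check runs only on a finite matrix.

```python
import warnings

import numpy as np


def guard_combined_meat(meat_spatial: np.ndarray, meat_serial: np.ndarray,
                        rtol: float = 1e-10) -> np.ndarray:
    """Finite + negative-eigenvalue diagnostics on meat = meat_spatial + meat_serial."""
    meat = meat_spatial + meat_serial
    # Saturated meats (all NaN after an upstream NaN-fail check) propagate quietly.
    if np.isnan(meat).all():
        return meat
    if not np.isfinite(meat).all():
        warnings.warn("combined Conley meat has non-finite entries", RuntimeWarning, stacklevel=2)
        return meat
    sym = 0.5 * (meat + meat.T)  # symmetrize before the eigenvalue check
    min_eig = float(np.linalg.eigvalsh(sym).min())
    scale = max(1.0, float(np.abs(sym).max()))
    if min_eig < -rtol * scale:
        warnings.warn(
            f"combined Conley meat is not PSD (min eigenvalue {min_eig:.3g})",
            RuntimeWarning,
            stacklevel=2,
        )
    return meat


# Usage: eye(2) + [[0, 2], [2, 0]] = [[1, 2], [2, 1]] has eigenvalues 3 and -1, so this warns.
combined = guard_combined_meat(np.eye(2), np.array([[0.0, 2.0], [2.0, 0.0]]))
```

Each term can be individually PSD-looking while the sum is not, which is why the check runs on the combined matrix rather than on either term alone.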

P3 (Documentation — public docs missing effective-PSU restriction):
README.md, diff_diff/guides/llms.txt, and docs/api/spillover.rst now
mention the effective-PSU requirement for `conley_lag_cutoff > 0`
(weights-only / strata-only without cluster fallback raises
NotImplementedError). REGISTRY's Restrictions list already had the
caveat; the public-surface docs are now consistent.

P3 (Wording — single-stratum reduction overclaim): REGISTRY and CHANGELOG
single-stratum reduction descriptions clarified to "Conley sandwich on
within-stratum-CENTERED PSU totals" (NOT plain). The Binder TSL centering
is retained at H=1 — under survey weights the per-period stratum mean is
always subtracted from PSU scores before the kernel application, even
when the stratum is the entire sample. The cross-sectional Wave E.2
reduction wording was already correct (centered).

P3 (Documentation — test_a `full meat matrix` overclaim): tightened
test_a docstring to accurately describe what is pinned (ATT AND scalar
SE bit-identity via assert_array_equal), with an explicit note that
full meat-matrix equality is implied — not directly asserted — because
the meat matrix is not exposed on `SpilloverDiDResults`.

P3 (Tech debt — code duplication, DEFERRED): the serial Bartlett kernel
logic is duplicated between two_stage.py (survey path) and conley.py
(no-survey path). Factoring out a shared kernel helper + shared PSD/finite
guard is cosmetic and out of scope for this PR. Added a TODO.md row to
track the refactor follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: fbdfe5598cb4449a247d956c5feecef531692044


Overall Assessment

✅ Looks good

Executive Summary

  • Affected method: SpilloverDiD’s Wave E.2 follow-up survey-Conley variance path for vcov_type="conley", conley_lag_cutoff > 0.
  • Cross-checking the changed implementation against docs/methodology/REGISTRY.md and the local Conley/Newey-West/Binder/Gerber/Gardner references shows the core methodology is aligned: centered serial scores, panel-wide FPC on the serial term, post-injection effective-PSU gating, and the spatial+serial panel-block composition all match the documented contract.
  • The prior P2 finding on the missing combined-meat finite/PSD warning surface is resolved by the final guard in diff_diff/two_stage.py:L841-L869.
  • The prior public-doc caveat gap around the effective-PSU requirement is resolved in README.md:L109-L109, diff_diff/guides/llms.txt:L61-L61, and docs/api/spillover.rst:L329-L405.
  • Remaining issues are minor documentation/test-wording drift only; I found no unmitigated P0/P1 defects in the diff.
  • Execution note: static review only; I could not run pytest in this environment because pytest is not installed.

Methodology

No findings. The new survey lag>0 path in diff_diff/spillover.py:L3089-L3131 and diff_diff/two_stage.py:L549-L1143 is consistent with the registry contract at docs/methodology/REGISTRY.md:L3279-L3305. The load-bearing deviations from the no-survey reference are explicitly documented Notes, so they are not defects.

Code Quality

No findings.

Performance

No findings.

Maintainability

  • Severity: P3-informational. Impact: the serial Bartlett kernel logic and final meat-validation logic are still duplicated between diff_diff/two_stage.py:L841-L869, diff_diff/two_stage.py:L1097-L1143, and diff_diff/conley.py:L949-L990, so the survey and no-survey panel-block paths can still drift. Concrete fix: none required for approval; this is now explicitly tracked in TODO.md:L142-L142.

Tech Debt

  • Severity: P3-informational. Impact: the lag>0 no-effective-PSU survey surface remains unsupported, but the PR now fail-closes it in diff_diff/spillover.py:L3089-L3131 and tracks it in TODO.md:L143-L143, so there is no silent correctness bug. Concrete fix: none required for approval; keep the current NotImplementedError until a derived no-effective-PSU serial allocator exists.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: a few surfaces still overstate or underspecify the exact shipped contract. docs/api/spillover.rst:L377-L379 still describes the H=1 lag>0 reduction as plain Newey-West on PSU score sequences, while the implemented and registry-pinned contract is centered PSU totals (docs/methodology/REGISTRY.md:L3289-L3293, diff_diff/two_stage.py:L1035-L1058, diff_diff/two_stage.py:L1124-L1140). diff_diff/two_stage.py:L135-L145 also says the path “requires survey_design.psu set,” even though cluster=<col> injection is a supported effective-PSU route. Separately, CHANGELOG.md:L13-L13 and tests/test_spillover.py:L6452-L6497 still describe a “full meat matrix” regression pin even though the regression only checks ATT and scalar SE. Concrete fix: align the API page and in-code docstring with the centered-score/effective-PSU contract in REGISTRY.md, and reword the changelog/test text to say ATT/SE regression pin unless the meat matrix is actually exposed and asserted.

CI Codex R2 verdict: ✅ Looks good (0 P0/P1/P2). One actionable P3 (docs
wording drift across 4 surfaces).

Fixes:

- docs/api/spillover.rst H=1 lag>0 reduction: "plain Newey-West Bartlett
  HAC on per-PSU score sequences" -> "Newey-West Bartlett HAC on per-period
  within-stratum-CENTERED per-PSU score sequences (NOT raw — Binder TSL
  centering is retained at H=1)". Now matches REGISTRY which was already
  updated in the previous push.

- diff_diff/two_stage.py _compute_gmm_corrected_meat docstring: "Requires
  survey_design.psu set" -> "Requires an effective PSU — either explicit
  survey_design.psu OR cluster=<col> injected as the effective PSU per
  Wave E.1's _inject_cluster_as_psu routing". Aligns the inline docstring
  with the actual post-resolution gate at spillover.py.

- CHANGELOG.md: removed "full meat matrix" overclaim. The lag=0 regression
  pin uses assert_array_equal on user-visible ATT + scalar SE (the meat
  matrix is not exposed on SpilloverDiDResults); the test_a2 mock-spy
  independently asserts the serial helper isn't invoked at lag=0.

- tests/test_spillover.py class docstring: same "full meat matrix" wording
  trimmed to "ATT and scalar SE bit-identity (test_a) plus mock-spy that
  the serial helper isn't invoked (test_a2)".

P3 informational items NOT addressed (out of scope for this PR per the
recurring codex assessment):
- Serial Bartlett kernel duplication between two_stage.py + conley.py
  (TODO.md row added in previous push; refactor would touch the no-survey
  panel-block path)
- no-effective-PSU survey + lag>0 fail-closed restriction (TODO.md row
  exists; methodology derivation pending)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b5b44435a8fe02fa88779b0fe88576c2a1ea67f


Overall Assessment

✅ Looks good

Executive Summary

  • Affected method: SpilloverDiD’s survey-Conley variance path for vcov_type="conley", conley_lag_cutoff > 0.
  • I cross-checked the new panel-block survey path in diff_diff/two_stage.py:L553-L1147 and the post-resolution gating in diff_diff/spillover.py:L3089-L3131 against docs/methodology/REGISTRY.md:L3279-L3305, the no-survey reference diff_diff/conley.py:L925-L990, and the stratified survey helper diff_diff/survey.py:L1987-L2052.
  • The load-bearing departures from the no-survey path are documented in the registry: centered serial scores, panel-wide serial FPC, and the effective-PSU requirement. I found no unmitigated P0/P1 methodology defects.
  • The prior public-doc caveat around the effective-PSU requirement is now resolved in README.md:L109-L109, diff_diff/guides/llms.txt:L61-L61, docs/api/spillover.rst:L329-L410, and docs/references.rst:L205-L207.
  • Remaining issues are P3 only: one lingering docs/tests overstatement about the lag=0 guarantee, plus tracked maintainability/tech-debt rows in TODO.md:L142-L143.
  • Static review only; I could not run pytest because the available Python interpreter in this environment does not have pytest or numpy.

Methodology

  • No findings. The new spatial+serial decomposition, effective-PSU fail-closed gate, centered serial scores, and panel-wide serial FPC all match the documented Wave E.2 follow-up contract in docs/methodology/REGISTRY.md:L3279-L3305; the documented deviations from the no-survey reference are explicitly labeled Notes/Restrictions, so they are not defects.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3-informational. Impact: the Bartlett serial-kernel construction and post-meat diagnostic logic are still duplicated between diff_diff/two_stage.py:L845-L873, diff_diff/two_stage.py:L878-L1147, and diff_diff/conley.py:L949-L990, so the survey and no-survey panel-block paths can drift. Concrete fix: factor out a shared serial-kernel helper and shared finite/PSD guard, as already tracked in TODO.md:L142-L142.

Tech Debt

  • Severity: P3-informational. Impact: vcov_type="conley" + conley_lag_cutoff > 0 + survey_design= still deliberately rejects no-effective-PSU designs at diff_diff/spillover.py:L3089-L3131; this is fail-closed and explicitly tracked, so it is not a blocker. Concrete fix: none required for approval; keep the current NotImplementedError until a derived no-effective-PSU allocator exists, per TODO.md:L143-L143.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the lag=0 regression narrative still overstates what is guaranteed. docs/api/spillover.rst:L372-L374 and docs/methodology/REGISTRY.md:L3290-L3291 say the orchestrator “early-returns,” and tests/test_spillover.py:L6486-L6498 still implies full meat-matrix equality, but the code only skips the serial helper and still runs the new combined-meat diagnostic guard at diff_diff/two_stage.py:L819-L873; ATT/SE are pinned, not the full meat matrix. Concrete fix: reword those docs/tests to “serial helper is not invoked; lag=0 ATT/SE regression is pinned,” or implement a true early return plus an exposed/asserted meat regression if that stronger claim is intended.
  • No further findings. The previously noted effective-PSU documentation gap appears resolved at README.md:L109-L109, diff_diff/guides/llms.txt:L61-L61, docs/api/spillover.rst:L329-L410, and docs/references.rst:L205-L207.

…rate

CI Codex R3 verdict: ✅ Looks good (0 P0/P1/P2). One actionable P3 (lag=0
narrative wording drift across 4 surfaces).

The "orchestrator early-returns" phrasing was technically inaccurate: at
lag=0 the orchestrator does NOT early-return. It still runs the spatial
per-period loop, the saturation NaN-fail guard, and the new combined-meat
PSD/finite guard. The lag=0 guarantee is narrower: the serial helper is
not INVOKED, so `meat_serial` contributes nothing to the combined meat.
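The narrower lag=0 guarantee (the helper is not invoked, but the rest of the orchestrator still runs) is the kind of property a mock spy pins. Here is a minimal sketch using `unittest.mock.patch.object(..., wraps=...)`. The orchestrator, helper, and guard are hypothetical stand-ins, not the diff_diff functions; they look helpers up through a namespace at call time so that the patch is visible to the caller.

```python
import types
from unittest import mock


def _serial_helper(seqs, lag):
    """Hypothetical serial Bartlett helper: sum over PSUs of t != s weighted products."""
    total = 0.0
    for seq in seqs:
        for t, u_t in enumerate(seq):
            for s, u_s in enumerate(seq):
                if t != s:
                    total += max(0.0, 1.0 - abs(t - s) / (lag + 1)) * u_t * u_s
    return total


def _final_guard(meat):
    """Hypothetical stand-in for the combined-meat finite/PSD guard."""
    return meat


# Looking helpers up through a namespace at call time lets mock.patch.object spy on them.
helpers = types.SimpleNamespace(serial=_serial_helper, guard=_final_guard)


def orchestrator(meat_spatial, seqs, lag):
    """No early return: the guard always runs; only the serial call depends on lag."""
    meat = meat_spatial
    if lag > 0:
        meat = meat + helpers.serial(seqs, lag)
    return helpers.guard(meat)


with mock.patch.object(helpers, "serial", wraps=helpers.serial) as serial_spy, \
        mock.patch.object(helpers, "guard", wraps=helpers.guard) as guard_spy:
    assert orchestrator(2.5, [[1.0, 2.0, 3.0]], lag=0) == 2.5
    serial_spy.assert_not_called()  # lag = 0: serial helper never invoked
    guard_spy.assert_called_once()  # ...but the final guard still ran
```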

Fixes:
- docs/api/spillover.rst H=1 lag>0 reduction note + lag=0 reduction line:
  "orchestrator early-returns before invoking the serial helper" ->
  "orchestrator skips the serial helper invocation when L = 0 so
  meat_serial does not contribute; the test_a2 mock-spy independently
  asserts the helper isn't invoked".
- docs/methodology/REGISTRY.md lag=0 reduction note: clarify that the
  spatial loop + saturation guard + new PSD/finite guard still run; the
  guarantee is on user-visible ATT + scalar SE, not on the meat matrix.
- tests/test_spillover.py test_a + test_a2 docstrings: reword
  "early-return" to "skips the serial helper invocation"; explicitly note
  that full meat-matrix equality is NOT asserted (only ATT + scalar SE).
- CHANGELOG.md: mirrored wording fix.

P3 informational items NOT addressed (both TODO-tracked):
- Serial Bartlett kernel duplication between two_stage.py + conley.py
- no-effective-PSU survey + lag>0 fail-closed restriction

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 82dda96f8b1a1bffcd5cc7174297f319b3ef5d8d


Overall Assessment

✅ Looks good

Executive Summary

  • Affected method: SpilloverDiD’s survey-Conley variance path for vcov_type="conley", conley_lag_cutoff > 0, implemented in diff_diff/two_stage.py:553-1147 with the post-resolution gate in diff_diff/spillover.py:3089-3131.
  • I cross-checked the new spatial+serial composition against the Methodology Registry at docs/methodology/REGISTRY.md:3279-3305, the no-survey panel-block reference at diff_diff/conley.py:925-990, and the existing survey spatial helper at diff_diff/survey.py:1987-2052.
  • The load-bearing departures from the no-survey path are documented in the registry: serial centering, panel-wide serial FPC, and the effective-PSU requirement. I found no unmitigated P0/P1 methodology defects.
  • The prior lag=0 wording/test overstatement appears resolved. The docs now describe skipping the serial helper rather than an early return, and the tests pin ATT/SE plus helper non-invocation rather than claiming exposed full-meat equality. See docs/api/spillover.rst:372-379, docs/methodology/REGISTRY.md:3289-3294, and tests/test_spillover.py:6484-6559.
  • Remaining issues are P3 only: duplicated serial-kernel/diagnostic logic already tracked in TODO.md:142-142, and the intentional no-effective-PSU restriction already tracked in TODO.md:143-143.
  • Static review only; I could not run the tests because the local Python environment here does not have pytest or numpy.

Methodology

  • No findings. Affected method: survey-weighted panel-block Conley for SpilloverDiD. The new meat_spatial + meat_serial construction, the post-resolution fail-closed gate for no-effective-PSU designs, the centered serial scores, and the panel-wide serial FPC all match the documented Wave E.2 follow-up contract in docs/methodology/REGISTRY.md:3279-3305 and the code paths in diff_diff/two_stage.py:553-1147. The deviations from the no-survey reference at diff_diff/conley.py:925-990 are explicitly labeled Notes/Restrictions, so they are not defects under the review rubric.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3-informational. Impact: the Bartlett serial-kernel construction and the post-meat finite/PSD diagnostics are now duplicated between the survey panel-block path and the no-survey panel-block path, which creates a drift risk if one side changes later. Concrete fix: factor a shared serial-kernel helper and shared post-meat diagnostic helper, as already tracked in TODO.md:142-142. References: diff_diff/two_stage.py:845-873, diff_diff/two_stage.py:1105-1144, diff_diff/conley.py:928-990, TODO.md:142-142.

Tech Debt

  • Severity: P3-informational. Impact: vcov_type="conley" + conley_lag_cutoff > 0 + survey_design= still intentionally rejects no-effective-PSU designs after survey resolution/injection. This is fail-closed, documented, and tracked, so it is not a blocker. Concrete fix: none required for approval; keep the current NotImplementedError until a no-effective-PSU serial derivation exists. References: diff_diff/spillover.py:3089-3131, docs/methodology/REGISTRY.md:3299-3305, TODO.md:143-143.

Security

  • No findings.

Documentation/Tests

  • No findings. The prior re-review P3 on lag=0 wording/testing looks addressed: docs now describe the actual helper-skip behavior, and tests pin the user-visible contract without overstating an early return. References: docs/api/spillover.rst:372-379, docs/methodology/REGISTRY.md:3289-3294, tests/test_spillover.py:6484-6559.
  • Static review only; I could not execute the added tests because the local Python environment lacks pytest and numpy.

igerber added the ready-for-ci label (Triggers CI test workflows) on May 21, 2026
igerber merged commit 88a4362 into main on May 21, 2026
33 of 34 checks passed
igerber deleted the spillover-conley-wave-e2-followup-lag branch on May 21, 2026 12:19
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):

Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley
  on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)

Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)

All changes additive; bit-equal defaults preserved across the affected
estimators. No new estimators (patch-level per semver convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

ready-for-ci Triggers CI test workflows
